Abstract. Maximum likelihood estimation is a valuable tool often applied to inverse problems
in quantum theory. Estimation from small data sets can, however, have non-unique solutions. We
discuss this problem and propose to use Jaynes' maximum entropy principle to single out the most
unbiased maximum-likelihood guess.



INTRODUCTION 

The role of variational principles in science can hardly be overemphasized. Maximization
or minimization of appropriate functionals provides elegant solutions to rather complicated
problems and contributes to a deeper philosophical understanding of the laws of Nature.
Minimization of the optical path (Fermat's principle) and minimization of the action
(Hamilton's principle) are two examples of such a treatment, in optics and classical
mechanics, respectively. In thermodynamics and statistical physics, an appropriate measure
to be maximized was introduced by Boltzmann as the entropy $S = k \log \Gamma$, where $\Gamma$
denotes the volume in phase space or the number of distinguishable states. The role of
entropy as an uncertainty measure in communication and information theory was recognized by
Shannon. His definition $S = -\sum_n p_n \log p_n$ is unique in the sense that it fulfills
reasonable demands put on an information measure associated with a probability distribution
$p_n$. In particular, the uncertainty is maximized when all outcomes are equally likely: the
uniform distribution contains the largest amount of uncertainty. Its implications for
physical and technical practice were noticed by Jaynes [1], who proposed a variational method
known as the principle of Maximum Entropy (MaxEnt). According to this rule, one should select
the probability distribution that fulfills the given constraints and simultaneously maximizes
the Shannon entropy. This gives the most unbiased solution of the problem consistent with the
given observations. On the philosophical level this corresponds to Laplace's celebrated
Principle of Insufficient Reason, which states that if there is no reason to prefer any one of
several possibilities, then the best strategy is to consider them equally likely and take the
average. This strategy has proved extremely useful in many applications covering the fields
of statistical inference, communication problems, and pattern recognition.

But entropy is not the only important functional in probability theory. The entropic
measure known as the Kullback-Leibler divergence, or relative entropy,
$E(\{p_i\}|\{q_i\}) = \sum_i p_i \log(p_i/q_i)$, bears a striking resemblance to the Shannon
entropy; however, it possesses a different interpretation. It quantifies the distance, in the
statistical sense, between two different distributions $p_i$ and $q_i$. Provided that one of
them ($p_i$ in our notation) consists of the sampled relative frequencies, the principle of
minimum relative entropy coincides with the maximum likelihood (MaxLik) estimation problem.
As in the previous case of the MaxEnt principle, MaxLik is not a rule that requires
justification: it does not need to be proved. At present there are many examples of
successful application of this estimation technique to solving inverse problems or, recently,
to quantifying such a fragile effect as entanglement.
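The link between minimum relative entropy and maximum likelihood can be made explicit in one line. The following is a sketch under stated assumptions: $p_i$ are the sampled relative frequencies from $N$ detections, $q_i$ are the probabilities predicted by the model (the symbol $q_i$ is our labeling), and the model-independent multinomial factor is dropped.

```latex
\frac{1}{N}\log\mathcal{L}
  = \sum_i p_i \log q_i
  = -\sum_i p_i \log\frac{p_i}{q_i} + \sum_i p_i \log p_i
  = -E(\{p_i\}|\{q_i\}) + \sum_i p_i \log p_i .
```

The last term does not depend on the model, so maximizing the likelihood over $q_i$ is the same as minimizing the relative entropy of the observed frequencies with respect to $q_i$.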

Though both celebrated principles, MaxEnt and MaxLik, rely on the notion of entropy, their
usage and interpretation differ substantially. Whereas the former provides the most
conservative guess still consistent with the data, the latter is the most optimistic one,
fitting the given data in the best possible way. However, both methods suffer from certain
drawbacks. MaxLik is capable of dealing with counted noisy data in realistic experiments, but
its interpretation usually requires a certain cut-off in the parameter space; otherwise the
solution may appear under-determined. The MaxEnt principle, on the other hand, removes this
ambiguity by selecting the most unbiased solution; however, realistic data may be
inconsistent due to fluctuations and then cannot be straightforwardly used as constraints.
The purpose of this contribution is to unify both concepts into a single estimation procedure
capable of handling any data, and to provide the most likely and most unbiased solution
without any cut-offs.



MAXIMUM-LIKELIHOOD QUANTUM-STATE RECONSTRUCTION

To address the problem of quantum state reconstruction, let us consider a generic quantum
measurement. The formulation will be developed for the case of finite-dimensional quantum
systems. The reader can think of a spin-1/2 system for the sake of simplicity.

Assume that we are given a finite number $N$ of identical samples of the system, each in the
same but unknown quantum state described by the density operator $\rho$. Given those systems,
our task is to identify the unknown true state $\rho$ as accurately as possible from the
results of measurements performed on them.

On the most general level, any set of measurements can be represented by a Probability
Operator Valued Measure (POVM), $\{\Pi_j\}$, $j = 1, \ldots, M$. Its elements are positive
semidefinite operators that sum to the identity operator, $\Pi_j \ge 0\ \forall j$,
$\sum_j \Pi_j = 1$. The last requirement is simply a consequence of the conservation of
probability: the measured particle is always detected in one of the $M$ output channels, and
no particles are lost.

Let us assume, for concreteness, that $N$ particles prepared in the same state have been
observed in $M$ different output channels of the measurement apparatus. For spin-1/2
particles those channels could be, for instance, the six output channels of a Stern-Gerlach
apparatus successively oriented along the $x$, $y$, and $z$ directions.

Provided that each particular output

$$\Pi_j, \qquad j = 1, \ldots, M \tag{1}$$

has been registered $n_j$ times, $\sum_j n_j = N$, the relative frequencies are given as
$f_j = n_j/N$. Using these data, the true state $\rho$ is to be inferred. The probabilities of
occurrence of the various outcomes are predicted by quantum mechanics as

$$p_j = \mathrm{Tr}\,\rho\Pi_j, \qquad j = 1, \ldots, M. \tag{2}$$

If the probabilities $p_j$ of getting a sufficient number of different outcomes $\Pi_j$ were
known, it would be possible to determine the true state $\rho$ directly by inverting the
linear relation (2). This is the philosophy behind the "standard" quantum tomographic
techniques. For example, in the rather trivial case of a spin-1/2 particle, the probabilities
of getting three linearly independent projectors determine the unknown state uniquely. Here,
however, a serious problem arises. Since only a finite number of systems can be investigated,
there is no way to find out those probabilities. The only data at one's disposal are the
relative frequencies $f_j$, which sample the principally unknowable probabilities $p_j$. For
a small number of runs, the true probabilities $p_j$ and the corresponding detected
frequencies $f_j$ may differ substantially. As a result, the modified realistic problem,

$$f_j = \mathrm{Tr}\,\Pi_j\rho, \tag{3}$$

has generally no solution on the space of positive semidefinite Hermitian operators
describing physical states. This linear equation for the unknown density matrix may be
solved, for example, by means of pattern functions, which could be considered a typical
example of the standard approach suffering from the above-mentioned drawbacks.
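The failure of direct inversion is easy to reproduce for a spin-1/2 particle. The sketch below is only an illustration: the counts are hypothetical, the helper names are ours, and it assumes projective Stern-Gerlach measurements with the same number of particles sent through each orientation, so that each Bloch component is the difference of the up and down frequencies on that axis.

```python
import math

def bloch_from_counts(counts):
    """Invert f_j = Tr(Pi_j rho) for a qubit measured along x, y and z.

    counts maps each axis to (n_up, n_down); for projective measurements
    along an axis the Bloch component is r_k = f_up - f_down on that axis.
    """
    bloch = []
    for axis in ("x", "y", "z"):
        n_up, n_down = counts[axis]
        bloch.append((n_up - n_down) / (n_up + n_down))
    return bloch

def qubit_eigenvalues(bloch):
    """Eigenvalues of rho = (1 + r.sigma)/2 are (1 +/- |r|)/2."""
    length = math.sqrt(sum(component ** 2 for component in bloch))
    return (1 + length) / 2, (1 - length) / 2

# Hypothetical counts: 5 particles sent through each orientation (N = 15).
counts = {"x": (4, 1), "y": (4, 1), "z": (5, 0)}
bloch = bloch_from_counts(counts)
lam_plus, lam_minus = qubit_eigenvalues(bloch)
print("Bloch vector:", bloch)
print("eigenvalues:", lam_plus, lam_minus)  # the smaller one is negative here
```

The inverted Bloch vector is longer than one, so the "reconstructed" matrix has a negative eigenvalue and is not a physical state, even though every single frequency is a perfectly legitimate experimental outcome.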

Once the measurements have been done and their results registered, the experimenter's
knowledge about the measured system has increased. Since quantum theory is probabilistic, it
makes little sense to ask the question: "What quantum state is determined by that
measurement?" A more appropriate question is: "What quantum states seem to be most likely
for that measurement?"

Quantum theory predicts the probabilities of individual detections, see Eq. (2). From them
one can construct the total joint probability of registering the data $\{n_j\}$. We assume
that the input system (particle) is always detected in one of the $M$ output channels, and
that this is repeated $N$ times. Consequently, the overall statistics of the experiment is
multinomial,

$$\mathcal{L}(\rho) = \frac{N!}{\prod_j n_j!}\prod_j \left[\mathrm{Tr}(\rho\Pi_j)\right]^{n_j}, \tag{4}$$

where $n_j = N f_j$ denotes the rate of registering a particular outcome $j$. In the
following we will omit the multinomial factor from expression (4), as it has no influence on
the results. Physically, quantum state reconstruction corresponds to a synthesis of various
measurements done under different experimental conditions, performed on an ensemble of
identically prepared systems. For example, the measurement might be the subsequent recording
of an unknown spin of the neutron (polarization of the photon) using different settings of
the Stern-Gerlach apparatus, or the recording of the quadrature operator of light in rotated
frames in quantum homodyne tomography. The likelihood functional $\mathcal{L}(\rho)$
quantifies the degree of belief in the hypothesis that, for a particular data set $\{n_j\}$,
the system was prepared in the quantum state $\rho$. The MaxLik estimation simply selects the
state for which the likelihood attains its maximum value on the manifold of density matrices.



To make the mathematics simpler we will maximize the logarithm of the likelihood functional,

$$L \propto \sum_j f_j \log p_j, \tag{5}$$

rather than $\mathcal{L}$ itself. Notice that $L$ is a concave functional,

$$L[\alpha\rho_1 + (1 - \alpha)\rho_2] \ge \alpha L(\rho_1) + (1 - \alpha) L(\rho_2), \tag{6}$$

defined on the convex set of positive semidefinite density matrices $\rho$, $\rho \ge 0$,
$\mathrm{Tr}\,\rho = 1$. This ensures that there is a single global maximum or at most a
closed convex set of equally likely quantum states.
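Inequality (6) can be spot-checked numerically. The sketch below is an illustration only: it assumes a toy model in which states and POVM elements are diagonal in a common basis, so that $p_j = \sum_i c_{ij}\rho_i$, and the matrix `C` and frequencies `F` are hypothetical values chosen for the example.

```python
import math
import random

# Hypothetical toy model: c_ij = <i|Pi_j|i>, each row sums to 1 (POVM completeness).
C = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
F = [0.4, 0.6]  # hypothetical relative frequencies f_j

def probabilities(rho):
    """p_j = sum_i rho_i c_ij for a diagonal state rho."""
    return [sum(rho[i] * C[i][j] for i in range(len(rho))) for j in range(len(F))]

def log_likelihood(rho):
    """L(rho) = sum_j f_j log p_j, Eq. (5) up to a constant factor."""
    return sum(f * math.log(p) for f, p in zip(F, probabilities(rho)))

def random_state(rng, dim=3):
    """Random strictly positive, normalized diagonal state."""
    weights = [rng.random() + 1e-3 for _ in range(dim)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    rho1, rho2, a = random_state(rng), random_state(rng), rng.random()
    mix = [a * x + (1 - a) * y for x, y in zip(rho1, rho2)]
    gap = log_likelihood(mix) - (a * log_likelihood(rho1) + (1 - a) * log_likelihood(rho2))
    worst_gap = min(worst_gap, gap)
print("smallest L(mix) - mix of L:", worst_gap)  # never negative beyond rounding
```

The gap between the two sides of Eq. (6) stays non-negative for every sampled pair, as expected from the concavity of the logarithm and the linearity of $p_j$ in $\rho$.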

The direct application of the variational principle to the likelihood functional, together
with the concavity property, yields the necessary and sufficient condition for its maximum in
the form of a nonlinear operator equation,

$$R\rho = \rho, \tag{7}$$

where

$$R = \sum_j \frac{f_j}{p_j}\,\Pi_j \tag{8}$$

is a positive semidefinite operator. In particular, $R$ is the identity operator provided the
maximum-likelihood solution is strictly positive.

Let us now consider a tomographically incomplete measurement. In such a case, the inverse
problem might have multiple solutions. This will happen, for example, when the set of
normalized Hermitian operators $\sigma$ satisfying the constraints $p_j(\sigma) = f_j\
\forall j$ has a nonempty intersection with the set of density matrices. As will be
illustrated below, two equally likely solutions of an under-determined inverse problem can be
very different. The question is then which maximum-likelihood state should be chosen as the
estimate of the true state.

We propose to use Jaynes' MaxEnt principle to resolve this ambiguity. The information content
of the set of MaxLik solutions can be quantified according to their entropy. A natural choice
is then to select the state maximizing the entropy, which is the least biased state with
respect to the missing measurements.

Let us assume that there are two different density operators $\rho_1$ and $\rho_2$ maximizing
the likelihood functional. The two operators satisfy the extremal equations (7),

$$R(\rho_1)\rho_1 = \rho_1, \qquad R(\rho_2)\rho_2 = \rho_2. \tag{9}$$

The interpretation of the operator $R$ is the following. Denoting by
$l(\rho, \sigma, \alpha) = L[(1 - \alpha)\rho + \alpha\sigma]$ the likelihood of a convex
combination of the states $\rho$ and $\sigma$, and calculating its path derivative at $\rho$,

$$\frac{d}{d\alpha}\, l(\rho, \sigma, \alpha)\Big|_{\alpha = 0}
  = \lim_{\alpha \to 0} \frac{l(\rho, \sigma, \alpha) - l(\rho, \sigma, 0)}{\alpha}
  = \mathrm{Tr}[R(\rho)\sigma] - 1, \tag{10}$$

we see that this derivative is given by the expectation value of $R(\rho)$ taken with
$\sigma$. Expectation values of the operator $R(\rho)$ define hyperplanes perpendicular to the
gradient of the likelihood at $\rho$.
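The path derivative (10) can be checked against a finite difference. This is a sketch under stated assumptions: it uses a hypothetical commuting toy model (diagonal states and POVM elements, $p_j = \sum_i c_{ij}\rho_i$), in which the operator $R = \sum_j (f_j/p_j)\Pi_j$ is diagonal with entries $R_i = \sum_j f_j c_{ij}/p_j$. All numbers and helper names are ours.

```python
import math

# Hypothetical toy model: c_ij = <i|Pi_j|i>, rows sum to 1; f_j are frequencies.
C = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
F = [0.4, 0.6]

def probabilities(rho):
    """p_j = sum_i rho_i c_ij."""
    return [sum(rho[i] * C[i][j] for i in range(len(rho))) for j in range(len(F))]

def log_likelihood(rho):
    """L(rho) = sum_j f_j log p_j."""
    return sum(f * math.log(p) for f, p in zip(F, probabilities(rho)))

def r_diagonal(rho):
    """Diagonal entries R_i = sum_j f_j c_ij / p_j of the operator R(rho)."""
    p = probabilities(rho)
    return [sum(F[j] * C[i][j] / p[j] for j in range(len(F))) for i in range(len(rho))]

rho = [0.5, 0.3, 0.2]    # point at which the derivative is taken
sigma = [0.1, 0.2, 0.7]  # direction of the path
alpha = 1e-6

mixed = [(1 - alpha) * r + alpha * s for r, s in zip(rho, sigma)]
numeric = (log_likelihood(mixed) - log_likelihood(rho)) / alpha
analytic = sum(R_i * s for R_i, s in zip(r_diagonal(rho), sigma)) - 1
print("finite difference:", numeric)
print("Tr[R(rho) sigma] - 1:", analytic)
```

The two numbers agree to the accuracy of the finite difference. Taking $\sigma = \rho$ gives $\mathrm{Tr}[R(\rho)\rho] = \sum_j f_j = 1$, so the derivative along the trivial path vanishes, as it must.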

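Since the likelihood cannot be increased by moving from $\rho_1$ toward $\rho_2$ and vice
versa (both density matrices are maximum-likelihood states), it follows that

$$\mathrm{Tr}[R(\rho_1)\rho_2] = \mathrm{Tr}[R(\rho_2)\rho_1] = 1. \tag{11}$$

Expressing the two conditions in terms of the probabilities $p_{1j}$ and $p_{2j}$ generated by
$\rho_1$ and $\rho_2$, respectively, we get

$$\sum_j f_j \frac{p_{2j}}{p_{1j}} = 1, \qquad \sum_j f_j \frac{p_{1j}}{p_{2j}} = 1, \tag{12}$$

which upon summing the left-hand sides yields the condition

$$\sum_j f_j\, \frac{p_{1j}^2 + p_{2j}^2}{2\, p_{1j}\, p_{2j}} = 1. \tag{13}$$

Now since $\sum_j f_j = 1$ and $(p_{1j}^2 + p_{2j}^2)/(2 p_{1j} p_{2j}) > 1$ unless
$p_{1j} = p_{2j}$, we obtain

$$\mathrm{Tr}\,\rho_1\Pi_j = \mathrm{Tr}\,\rho_2\Pi_j, \qquad \forall j, \tag{14}$$

which means that the probabilities, and therefore the operators $R(\rho_1)$ and $R(\rho_2)$,
are identical. The two extremal equations therefore read

$$R\rho_1 = \rho_1, \qquad R\rho_2 = \rho_2. \tag{15}$$

Notice that both $\rho_1$ and $\rho_2$ commute with the common generator $R$.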



MAXIMIZATION OF ENTROPY 

Having found a maximum of the likelihood functional, we still do not know whether this
solution is unique or not. Provided a closed set of such states exists, we would like to
maximize the entropy functional over it. In this way we will get the least biased
maximum-likelihood guess.

The properties of the maximum-likelihood solutions discussed above simplify this problem
considerably, because we know that all density matrices belonging to the maximum-likelihood
set generate the same probabilities.

We will take those probabilities as constraints of the new optimization problem: maximize
the entropy,

$$S(\rho) = -\mathrm{Tr}(\rho \ln \rho), \tag{16}$$

subject to the constraints

$$\mathrm{Tr}(\rho\Pi_j) = \mathrm{Tr}(\rho_{\rm ML}\Pi_j), \qquad j = 0, \ldots, M, \tag{17}$$

where $\rho_{\rm ML}$ is a maximum-likelihood state and where we defined $\Pi_0 = 1$ to keep
the normalization of the estimated state.

The problem of Eqs. (16) and (17) is known to have a unique solution [1],

$$\rho_{\rm ME} = \exp\Big[\sum_j \lambda_j \Pi_j\Big], \qquad j = 0, \ldots, M, \tag{18}$$

where the Lagrange multipliers $\lambda_j$ can be determined from the constraints.

The proposed approach combines good features of the maximum-likelihood and maximum-entropy
methods. From the set of density matrices that are most consistent with the observed data in
the sense of maximum likelihood, we select the least biased one. At the same time the
positivity, and thus also the physical soundness, of the result is guaranteed.

Let us remind the reader that a direct application of the maximum entropy principle to raw
data (i.e., with the right-hand sides of the constraints Eq. (17) replaced by $f_j$) is not
possible, because the constraints often cannot be satisfied by any positive semidefinite
density operator due to the unavoidable presence of noise in the data.

For the rest of the paper let us restrict ourselves to the simplest case of commuting
measurements, $[\Pi_j, \Pi_k] = 0\ \forall j, k$. Such a tomographic scheme would correspond
to the measurement of the diagonal elements of the true density matrix. Although this may seem
like an oversimplification, many inverse problems can be reduced to this form. Let us mention
neutron absorption tomography and inefficient photodetection as two significant examples.



EXAMPLE 

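We will illustrate the proposed reconstruction scheme on a simple example of commuting
measurements. Denoting by $\{|i\rangle\}$ the common eigenbasis of the POVM elements
$\{\Pi_j\}$, we have

$$\sum_j \lambda_j \Pi_j = \sum_i r_i\, |i\rangle\langle i|. \tag{19}$$

The maximum-entropy solution Eq. (18) will assume a diagonal form in this basis, its
eigenvalues being

$$\langle i|\rho_{\rm ME}|i\rangle = \exp\Big[\sum_j \lambda_j \langle i|\Pi_j|i\rangle\Big]. \tag{20}$$

Denoting $\rho_i = \langle i|\rho_{\rm ML}|i\rangle$ and $c_{ij} = \langle i|\Pi_j|i\rangle$,
we finally get a nonlinear system of equations,

$$\sum_i \rho_i c_{ij} = \sum_i c_{ij} \exp\Big[\sum_k \lambda_k c_{ik}\Big], \qquad j = 0, \ldots, M, \tag{21}$$

that is to be solved for the unknown $M + 1$ Lagrange multipliers $\lambda_j$.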
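One possible way to solve a system of the form (21) numerically is generalized iterative scaling (Darroch and Ratcliff), sketched below. This is our choice of solver for illustration, not necessarily the one used for the results reported here; the matrix `C`, the state `RHO_ML`, and all function names are hypothetical. The sketch relies on $c_{ij} \ge 0$ and $\sum_{j \ge 1} c_{ij} = 1$, which makes the $\Pi_0 = 1$ constraint redundant, so its multiplier is simply held fixed.

```python
import math

def maxent_state(c, targets, iterations=2000):
    """Generalized iterative scaling for rho_i = exp(lam0 + sum_j lam[j] c[i][j]).

    c[i][j] >= 0 with every row summing to 1; targets[j] = sum_i c[i][j] rho_ML[i].
    """
    dim, m = len(c), len(targets)
    lam0 = -math.log(dim)  # multiplier of Pi_0 = 1, fixed: start from the uniform state
    lam = [0.0] * m

    def state(multipliers):
        return [math.exp(lam0 + sum(multipliers[j] * c[i][j] for j in range(m)))
                for i in range(dim)]

    for _ in range(iterations):
        rho = state(lam)
        p = [sum(rho[i] * c[i][j] for i in range(dim)) for j in range(m)]
        lam = [lam[j] + math.log(targets[j] / p[j]) for j in range(m)]
    return lam0, lam, state(lam)

def entropy(rho):
    """Entropy of a diagonal state, Eq. (16) restricted to the common eigenbasis."""
    return -sum(x * math.log(x) for x in rho if x > 0)

# Hypothetical example: three levels, two outcomes, one maximum-likelihood state.
C = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
RHO_ML = [0.6, 0.1, 0.3]
targets = [sum(RHO_ML[i] * C[i][j] for i in range(3)) for j in range(2)]

lam0, lam, rho_me = maxent_state(C, targets)
p_me = [sum(rho_me[i] * C[i][j] for i in range(3)) for j in range(2)]
print("multipliers:", lam0, lam)
print("MaxEnt state:", rho_me)
print("entropy ML vs ME:", entropy(RHO_ML), entropy(rho_me))
```

The resulting state reproduces the probabilities of the maximum-likelihood state while having a larger entropy. Because the POVM elements sum to the identity, the individual multipliers are fixed only up to a common shift absorbed by the $\Pi_0$ term, but the state itself is unique.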





FIGURE 2. Randomly generated matrix $c_{ij}$ parameterizing a three-element POVM.



A particular true six-dimensional vector $\rho_{{\rm true},i}$ is shown in Fig. 1. In a
simulated experiment this state has been subjected to a randomly generated three-element
POVM; its elements $c_{ij}$ in the common diagonalizing basis are shown in Fig. 2.

The probabilities of observing the results $j = 1, 2, 3$ are
$p_j = \sum_i c_{ij}\,\rho_{{\rm true},i}$. They are shown in Fig. 3 for our particular choice
of $\rho_{\rm true}$ and $c_{ij}$. Taking the probabilities as the input data, we solved the
maximum-likelihood extremal equation iteratively, starting from three different strictly
positive density matrices. It is worth noting that quantum state reconstruction from
compatible observations is a linear and positive problem. In this case the operator equation
(7) reduces to a simple diagonal form which is suitable for iterative solution. This
algorithm is sometimes called the expectation-maximization algorithm in the statistical
literature and is known to converge monotonically from any strictly positive initial point.
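In the diagonal case the extremal equation (7) suggests the fixed-point iteration $\rho_i \leftarrow \rho_i R_i(\rho)$ with $R_i = \sum_j f_j c_{ij}/p_j$. The sketch below illustrates it on a small hypothetical example (three levels, two outcomes) rather than on the six-dimensional state of the figures; the matrix, frequencies, starting points, and function names are our assumptions.

```python
def predicted(c, rho):
    """p_j = sum_i rho_i c_ij."""
    return [sum(rho[i] * c[i][j] for i in range(len(rho))) for j in range(len(c[0]))]

def em_maxlik(c, freqs, rho0, iterations=500):
    """Iterate rho_i <- rho_i * R_i(rho), the diagonal form of R rho = rho."""
    rho = list(rho0)
    for _ in range(iterations):
        p = predicted(c, rho)
        rho = [rho[i] * sum(freqs[j] * c[i][j] / p[j] for j in range(len(freqs)))
               for i in range(len(rho))]
    return rho

# Hypothetical under-determined problem: 3 unknowns, 2 outcomes.
C = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
FREQS = [0.62, 0.38]
starts = [[1 / 3, 1 / 3, 1 / 3], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

estimates = [em_maxlik(C, FREQS, start) for start in starts]
for rho in estimates:
    print([round(x, 4) for x in rho], [round(p, 6) for p in predicted(C, rho)])
```

Each update preserves the normalization, since $\sum_i \rho_i R_i = \sum_j f_j = 1$. In this toy problem the three runs end in visibly different states that nevertheless predict the same outcome probabilities, which is the ambiguity the entropy maximization is meant to remove.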

As we can see, the three maximum-likelihood estimates represent very different system
configurations. The simple POVM used was too coarse to resolve those differences, and as a
consequence, all the estimated states yield exactly the same probabilities of the three
possible outcomes of the measurement.

FIGURE 3. Relative frequencies of the outcomes of a thought tomographic measurement. For the
true state and POVM elements, see Figs. 1 and 2.

FIGURE 4. Three particular maximum-likelihood estimates based on the result of the thought
experiment described in the text.

In the next step, those probabilities were used as constraints for the entropy maximization,
as discussed in the previous section. As a result, a unique state was selected out of the set
of maximum-likelihood states. The result is shown in Fig. 5. Notice that this state is a good
approximation to the original state of Fig. 1. Even though the two are smoothed out a bit,
they can be clearly recognized in the reconstruction.



CONCLUSION 



We have demonstrated the utility of the maximum-entropy principle for tomographically
incomplete quantum state reconstruction schemes. Although the entropic principles cannot be
directly applied to noisy experimental data due to the positivity of quantum states, they can
be used to remove the ambiguity of maximum-likelihood estimation. The proposed method could
find applications in quantum homodyne detection and other related infinite-dimensional
problems suffering from a lack of experimental data.

FIGURE 5. The result of entropy maximization over the set of maximum-likelihood estimates
(three of which are shown in Fig. 4).